Convergence Proofs of Least Squares Policy Iteration Algorithm for High-Dimensional Infinite Horizon Markov Decision Process Problems
Authors
Abstract
Most of the current theory for dynamic programming algorithms focuses on finite state, finite action Markov decision problems, with a paucity of theory for the convergence of approximation algorithms with continuous states. In this paper we propose a policy iteration algorithm for infinite-horizon Markov decision problems where the state and action spaces are continuous and the expectation cannot be computed exactly. We show that an appropriately designed least squares (LS) or recursive least squares (RLS) method is provably convergent under certain problem structure assumptions on value functions. In addition, we show that the LS/RLS approximate policy iteration algorithm converges in the mean, meaning that the mean error between the approximate policy value function and the optimal value function shrinks to zero as successive approximations become more accurate. Furthermore, the convergence results are extended to the more general case of unknown basis functions with orthogonal polynomials.
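As a rough illustration of the kind of algorithm analyzed here, the sketch below runs least-squares approximate policy iteration on a hypothetical one-dimensional problem: at each iteration it samples states, acts greedily with respect to the current approximation, and refits the value-function weights by least squares on a small polynomial basis. The dynamics, reward, basis, and sampling scheme are illustrative assumptions, not the construction used in the paper.

    import numpy as np

    gamma = 0.9
    rng = np.random.default_rng(0)
    actions = np.linspace(-1.0, 1.0, 21)      # discretized actions for the greedy step

    def basis(s):
        # small polynomial basis phi(s); the paper works with orthogonal polynomials
        return np.array([1.0, s, s**2])

    def step(s, a):
        # hypothetical continuous-state dynamics with additive noise and a quadratic reward
        s_next = 0.8 * s + a + 0.1 * rng.standard_normal()
        return s_next, -(s**2 + 0.1 * a**2)

    theta = np.zeros(3)                        # value-function weights

    for iteration in range(20):
        Phi, targets = [], []
        for _ in range(500):
            s = rng.uniform(-2.0, 2.0)
            # one-sample greedy action under the current approximation
            q = []
            for a in actions:
                s_next, r = step(s, a)
                q.append(r + gamma * basis(s_next) @ theta)
            a = actions[int(np.argmax(q))]
            s_next, r = step(s, a)
            Phi.append(basis(s))
            targets.append(r + gamma * basis(s_next) @ theta)
        # least squares fit of V(s) ~ basis(s) @ theta to the sampled Bellman targets
        theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(targets), rcond=None)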
Similar Articles
Convergence Analysis of On-Policy LSPI for Multi-Dimensional Continuous State and Action-Space MDPs and Extension with Orthogonal Polynomial Approximation
We propose an online, on-policy least-squares policy iteration (LSPI) algorithm which can be applied to infinite horizon problems where states and controls are vector-valued and continuous. We do not use special structure such as linear, additive noise, and we assume that the expectation cannot be computed exactly. We use the concept of the post-decision state variable to eliminate the exp...
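A bare sketch of the post-decision-state idea under stated assumptions: the transition is split into a deterministic map f(s, a) followed by exogenous noise, the post-decision value function is linear in a small basis, and its weights are updated by recursive least squares along the trajectory. The dynamics, reward, basis, and the helper names (f, greedy) are hypothetical and introduced only for illustration.

    import numpy as np

    gamma, lam = 0.95, 1.0                    # discount factor, RLS forgetting factor
    rng = np.random.default_rng(1)
    actions = np.linspace(-1.0, 1.0, 21)

    def f(s, a):                              # post-decision state: deterministic part of the transition
        return 0.9 * s + a

    def reward(s, a):
        return -(s**2 + 0.1 * a**2)

    def basis(s_post):
        return np.array([1.0, s_post, s_post**2])

    theta = np.zeros(3)                       # weights of the post-decision value function
    B = 100.0 * np.eye(3)                     # RLS inverse-covariance matrix

    def greedy(s):
        # no expectation is needed inside the argmax: noise enters only after f(s, a)
        vals = [reward(s, u) + gamma * basis(f(s, u)) @ theta for u in actions]
        k = int(np.argmax(vals))
        return actions[k], vals[k]

    s = 0.0
    for t in range(1000):
        a, _ = greedy(s)
        s_post = f(s, a)
        s_next = s_post + 0.1 * rng.standard_normal()   # exogenous noise realization
        _, v_hat = greedy(s_next)                        # sampled target for the value of s_post
        phi = basis(s_post)
        Bphi = B @ phi
        gain = Bphi / (lam + phi @ Bphi)
        theta = theta + gain * (v_hat - phi @ theta)     # recursive least squares update
        B = (B - np.outer(gain, Bphi)) / lam
        s = s_next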
Risk-averse dynamic programming for Markov decision processes
We introduce the concept of a Markov risk measure and we use it to formulate risk-averse control problems for two Markov decision models: a finite horizon model and a discounted infinite horizon model. For both models we derive risk-averse dynamic programming equations and a value iteration method. For the infinite horizon problem we also develop a risk-averse policy iteration method and we pro...
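For concreteness, one coherent one-step risk measure fitting this framework is mean-upper-semideviation. The toy value iteration below replaces the expectation in the Bellman operator with it on a hypothetical two-state, two-action MDP; the transition probabilities, costs, and the weight c are placeholders, not taken from the paper.

    import numpy as np

    gamma, c = 0.9, 0.5                       # discount factor, risk-aversion weight
    P = np.array([[[0.8, 0.2], [0.3, 0.7]],   # P[a, s, s']: two states, two actions
                  [[0.5, 0.5], [0.9, 0.1]]])
    cost = np.array([[1.0, 2.0],              # cost[a, s]
                     [1.5, 0.5]])

    def risk(p, z):
        # mean-upper-semideviation of the cost vector z under distribution p
        m = p @ z
        return m + c * (p @ np.maximum(z - m, 0.0))

    v = np.zeros(2)
    for _ in range(200):
        q = np.array([[cost[a, s] + gamma * risk(P[a, s], v) for s in range(2)]
                      for a in range(2)])
        v = q.min(axis=0)                     # risk-averse Bellman operator (minimizing cost)
    print(v)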
Convergence Analysis of Kernel-based On-policy Approximate Policy Iteration Algorithms for Markov Decision Processes with Continuous, Multidimensional States and Actions
Using kernel smoothing techniques, we propose three different online, on-policy approximate policy iteration algorithms which can be applied to infinite horizon problems with continuous and vector-valued states and actions. Using Monte Carlo sampling to estimate the value function around the post-decision state, we reduce the problem to a sequence of deterministic, nonlinear programming problem...
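The kernel-smoothing idea itself is simple to state: estimate the value at a post-decision state by a Nadaraya-Watson weighted average over previously sampled (state, observed value) pairs. The small sketch below shows that estimator; the bandwidth and the synthetic data are placeholders, not the authors' choices.

    import numpy as np

    def kernel_value(s_query, states, values, h=0.3):
        # Gaussian kernel weights over stored post-decision states
        w = np.exp(-0.5 * ((states - s_query) / h) ** 2)
        return (w @ values) / max(w.sum(), 1e-12)

    rng = np.random.default_rng(2)
    states = rng.uniform(-2.0, 2.0, size=200)                   # visited post-decision states
    values = -(states ** 2) + 0.1 * rng.standard_normal(200)    # noisy sampled returns
    print(kernel_value(0.5, states, values))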
Point-Based Policy Iteration
We describe a point-based policy iteration (PBPI) algorithm for infinite-horizon POMDPs. PBPI replaces the exact policy improvement step of Hansen’s policy iteration with point-based value iteration (PBVI). Despite being an approximate algorithm, PBPI is monotonic: At each iteration before convergence, PBPI produces a policy for which the values increase for at least one of a finite set of init...
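The point-based backup that PBVI contributes to the improvement step can be sketched as follows on a hypothetical two-state, two-action, two-observation POMDP. This is only the backup over a fixed belief set, not the full PBPI loop; T, O, R, the belief points, and the iteration count are illustrative.

    import numpy as np

    gamma = 0.95
    T = np.array([[[0.9, 0.1], [0.2, 0.8]],    # T[a, s, s']
                  [[0.5, 0.5], [0.5, 0.5]]])
    O = np.array([[[0.8, 0.2], [0.3, 0.7]],    # O[a, s', o]
                  [[0.5, 0.5], [0.5, 0.5]]])
    R = np.array([[1.0, 0.0],                  # R[a, s]
                  [0.2, 0.2]])

    def point_backup(b, Gamma):
        # one point-based backup at belief b given the current alpha-vector set Gamma
        best_val, best_alpha = -np.inf, None
        for a in range(T.shape[0]):
            g_a = R[a].copy()
            for o in range(O.shape[2]):
                # projected vectors g_{a,o}(s) = sum_{s'} T(s,a,s') O(s',a,o) alpha(s')
                proj = [T[a] @ (O[a][:, o] * alpha) for alpha in Gamma]
                g_a = g_a + gamma * max(proj, key=lambda g: b @ g)
            if b @ g_a > best_val:
                best_val, best_alpha = b @ g_a, g_a
        return best_alpha

    Gamma = [np.zeros(2)]
    beliefs = [np.array([0.5, 0.5]), np.array([0.9, 0.1])]
    for _ in range(50):
        Gamma = [point_backup(b, Gamma) for b in beliefs]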
Convergence Results for Some Temporal Difference Methods Based on Least Squares
We consider finite-state Markov decision processes, and prove convergence and rate of convergence results for certain least squares policy evaluation algorithms of the type known as LSPE(λ). These are temporal difference methods for constructing a linear function approximation of the cost function of a stationary policy, within the context of infinite-horizon discounted and average cost dynamic...
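For the λ = 0 case, the simulation-based iteration has roughly the shape sketched below for a hypothetical two-state chain under a fixed policy: running sums estimate the matrices of the projected Bellman equation, and the weights are moved by solving with the current estimates. The transition matrix, costs, basis, unit stepsize, and the small regularization of B are illustrative assumptions rather than the paper's exact scheme.

    import numpy as np

    gamma = 0.9
    P = np.array([[0.7, 0.3],                 # transition matrix of the fixed policy
                  [0.4, 0.6]])
    g = np.array([1.0, 2.0])                  # one-stage costs
    Phi = np.array([[1.0, 0.0],               # rows are feature vectors phi(x)
                    [1.0, 1.0]])

    rng = np.random.default_rng(3)
    theta = np.zeros(2)
    B = 1e-3 * np.eye(2)                      # small regularization keeps B invertible early on
    A = np.zeros((2, 2))
    b = np.zeros(2)

    x = 0
    for t in range(5000):
        x_next = rng.choice(2, p=P[x])
        phi, phi_next = Phi[x], Phi[x_next]
        # simulation-based sums defining the LSPE(0) iteration
        B += np.outer(phi, phi)
        A += np.outer(phi, gamma * phi_next - phi)
        b += phi * g[x]
        theta = theta + np.linalg.solve(B, A @ theta + b)
        x = x_next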